Deep Exploration via Bootstrapped DQN
Authors
Abstract
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through the use of randomized value functions. Unlike dithering strategies such as ε-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
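The contrast the abstract draws between dithering and deep exploration can be sketched in a toy tabular setting: K bootstrap "heads" (here, K small Q-tables standing in for the K network heads) are trained on shared experience with independent Bernoulli masks, and each episode commits to a single randomly sampled head, acting greedily with respect to it throughout. The chain environment, exploring starts, and all constants below are illustrative assumptions, not the paper's actual network implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, K = 5, 2, 10          # toy chain MDP, K bootstrap heads
Q = rng.normal(scale=0.01, size=(K, n_states, n_actions))  # one Q-table per head
alpha, gamma, p_mask = 0.1, 0.99, 0.5

def chain_step(s, a):
    """Chain MDP: action 1 moves right, action 0 moves left; reward 1 at the end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n_states - 1
    return s2, float(done), done

def run_episode():
    k = int(rng.integers(K))               # sample ONE head at episode start...
    s = int(rng.integers(n_states - 1))    # ...from a random start state (an assumption)
    for _ in range(20):
        a = int(np.argmax(Q[k, s]))        # act greedily w.r.t. head k all episode:
        s2, r, done = chain_step(s, a)     # temporally-extended, no per-step dithering
        # Bernoulli mask: each head trains on its own bootstrap of the shared data.
        mask = rng.random(K) < p_mask
        target = r + gamma * (1.0 - done) * Q[:, s2].max(axis=1)
        Q[mask, s, a] += alpha * (target[mask] - Q[mask, s, a])
        s = s2
        if done:
            break

for _ in range(1000):
    run_episode()

print(np.argmax(Q.mean(axis=0), axis=1))   # ensemble-greedy action per state
```

Because each head is a different bootstrap sample of the data, heads disagree where experience is scarce, and committing to one head per episode turns that disagreement into directed, multi-step exploration.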
Similar References
Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation
In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such...
Bayes By Backprop Neural Networks for Dialogue Management
In dialogue management for statistical spoken dialogue systems, an agent learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful dialogue policy estimation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, which is not as sample efficient as methods that use uncertainty estimates,...
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger
Recent developments have established the vulnerability of deep Reinforcement Learning (RL) to policy manipulation attacks via adversarial perturbations. In this paper, we investigate the robustness and resilience of deep RL to training-time and test-time attacks. Through experimental results, we demonstrate that under non-contiguous training-time attacks, Deep Q-Network (DQN) agents can recover a...
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors...
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity such as [1]. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases...
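A progressively increasing discount factor, as this abstract alludes to, can be sketched as a simple annealing schedule that shrinks the "horizon gap" (1 − γ) geometrically toward a ceiling. The rule and every constant below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical dynamic-discount schedule: begin with a short effective horizon
# (small gamma) and anneal toward a target by geometrically shrinking 1 - gamma.
# All constants are illustrative assumptions, not the paper's values.
gamma, gamma_max, shrink = 0.90, 0.99, 0.98
schedule = []
for epoch in range(30):
    schedule.append(gamma)
    gamma = min(gamma_max, 1.0 - shrink * (1.0 - gamma))
```

A small initial γ keeps early bootstrapped targets short-sighted and stable; annealing it upward later lets the agent account for longer-term reward once value estimates are more reliable.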